Note: the ggplot package is contained within the tidyverse library.
library(tidyverse)
head(diamonds)
| carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
| 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
ggplot(data = diamonds)
ggplot(data = diamonds, aes(x = carat, y = price))
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point()
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point(aes(colour = cut))
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point(aes(colour = cut), size = 2, alpha = .5) +
geom_smooth()
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point(aes(colour = cut), size = 2, alpha = .5) +
geom_smooth(aes(fill = cut), colour = "lightgrey")
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point(aes(colour = cut), size = 2, alpha = .5) +
geom_smooth(aes(fill = cut), colour = "lightgrey") +
scale_y_log10()
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point(aes(colour = cut), size = 2, alpha = .5) +
geom_smooth(aes(fill = cut), colour = "lightgrey") +
scale_y_log10() +
facet_wrap(~cut)
To tidy the preg table use pivot_longer() to create a long table.
preg <- tibble(pregnant = c("yes", "no"),
male = c(NA, 10),
female = c(20, 12))
preg
## # A tibble: 2 x 3
## pregnant male female
## <chr> <dbl> <dbl>
## 1 yes NA 20
## 2 no 10 12
Solution
preg_long <- preg %>%
pivot_longer(cols = c("male", "female"),
names_to = "sex",
values_to = "count")
preg_long
## # A tibble: 4 x 3
## pregnant sex count
## <chr> <chr> <dbl>
## 1 yes male NA
## 2 yes female 20
## 3 no male 10
## 4 no female 12
Change the code below to have the points on top of the boxplots.
ggplot(data = mpg, aes(x = class, y = hwy)) +
geom_jitter() +
geom_boxplot()
Solution
ggplot(data = mpg, aes(x = class, y = hwy)) +
geom_boxplot() +
geom_jitter()
In the diamonds data, clarity and cut are ordinal, while price and carat are continuous.
Create a graphic that gives an overview of these four variables while respecting their types.
One possible plot, there will be many!
data(diamonds)
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = clarity)) +
geom_smooth(aes()) +
facet_grid(~cut)
The movies data set contains information from IMDB.com including ratings, genre, length in minutes, and year of release. Explore the differences in length, rating, etc. in movie genres over time. Hint: use faceting!
A few different plots, there will be many!
movies <- read.csv("https://srvanderplas.github.io/rwrks/02-r-graphics/data/MovieSummary.csv")
summary(movies)
## X title year length
## Min. : 7 Length:65134 Min. :1893 Min. : 1.00
## 1st Qu.:144108 Class :character 1st Qu.:1954 1st Qu.: 24.00
## Median :195321 Mode :character Median :1983 Median : 89.00
## Mean :208093 Mean :1975 Mean : 73.36
## 3rd Qu.:258227 3rd Qu.:1998 3rd Qu.:100.00
## Max. :411511 Max. :2005 Max. :873.00
##
## budget rating votes mpaa
## Min. : 0 Min. : 1.000 Min. : 5 Length:65134
## 1st Qu.: 320000 1st Qu.: 5.300 1st Qu.: 12 Class :character
## Median : 4000000 Median : 6.300 Median : 32 Mode :character
## Mean : 15489887 Mean : 6.138 Mean : 768
## 3rd Qu.: 20000000 3rd Qu.: 7.100 3rd Qu.: 131
## Max. :200000000 Max. :10.000 Max. :157608
## NA's :58713
## genre
## Length:65134
## Class :character
## Mode :character
##
##
##
##
ggplot(movies, aes(x = year, y = budget, group = genre, color = genre)) +
geom_point()
ggplot(movies, aes(x = year, y = length, group = genre, color = genre)) +
geom_smooth()
ggplot(movies, aes(x = budget, y = rating, color = genre, group = genre)) +
geom_point() +
geom_smooth() +
facet_wrap(~mpaa)
ggplot(movies, aes(x = log(budget + 1), y = rating, color = genre, group = genre)) +
geom_point() +
geom_smooth()
ggplot(movies, aes(x = genre, fill = mpaa)) +
geom_bar()
ggplot(movies, aes(x = rating, group = mpaa, fill = mpaa)) +
geom_density(alpha = .4) +
facet_wrap(~genre, nrow = 2)
install.packages("palmerpenguins")
data(penguins, package = "palmerpenguins")
head(penguins)
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
Meet the Palmer penguins & Bill Dimensions by Allison Horst
bill length versus bill width from the penguins data, colored by speciesp0 <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
geom_point()
p0
p1 <- p0 +
theme_bw()
p1
p2 <- p1 +
scale_x_continuous("Bill Length (mm)") +
scale_y_continuous("Bill Depth (mm)") +
ggtitle("Palmer Penguins", subtitle = "Bill Size")
p2
p3 <- p2 +
scale_color_viridis_d("Species")
p3
p4 <- p3 +
theme(legend.position = "bottom",
aspect.ratio = 1)
p4
Make sure you know where this is saving to; remember R projects and working directories!
ggsave(filename = "penguins.pdf", plot = p4)
ggsave(filename = "diamonds.png", plot = p4)